Revisiting the Case for Explicit Syntactic Information in Language Models
Authors
Abstract
Statistical language models used in deployed systems for speech recognition, machine translation and other human language technologies are almost exclusively n-gram models. They are regarded as linguistically naïve, but estimating them from any amount of text, large or small, is straightforward. Furthermore, they have doggedly matched or outperformed numerous competing proposals for syntactically well-motivated models. This unusual resilience of n-grams, as well as their weaknesses, are examined here. It is demonstrated that n-grams are good word-predictors, even linguistically speaking, in a large majority of word-positions, and it is suggested that to improve over n-grams, one must explore syntax-aware (or other) language models that focus on positions where n-grams are weak.
Similar resources
Native-like Event-related Potentials in Processing the Second Language Syntax: Late Bilinguals
Background: The P600 brain wave reflects syntactic processes in response to different first language (L1) syntactic violations, syntactic repair, structural reanalysis, and specific semantic components. Unlike semantic processing, aspects of the second language (L2) syntactic processing differ from the L1, particularly at lower levels of proficiency. At higher L2 proficiency, syntactic violatio...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...
The Impact of Different Frequency Patterns on the Syntactic Production of a 6-year-old EFL Home Learner: A Case Study
This longitudinal study investigated the impact of different Frequency Patterns (FP) on the syntactic production of a six-year-old EFL learner in a home context. Target syntactic constructions were presented using games and plays and were traced for their occurrence patterns in input and output. Following each instruction period, the constructions were measured through immediate and delayed ora...
Syntactic Properties of Language of Scientific Communication in Persian Scientific Works
Purpose: The language of science is one of the social types of Persian language, which is used by the educated classes in scientific works and contexts. The purpose of this research is to present an overall picture of the syntactic properties of the Persian scientific language. The types of sentences, types of tenses, verb tenses, and syntactic constructions have been identified in the scientif...
Revisiting the Arabic Diglossic Situation and Highlighting the Socio-Cultural Factors Shaping Language Use in Light of Auer’s (2005) Model
In the field of Arabic sociolinguistics, diglossia has been an interesting linguistic inquiry since it was first discussed by Ferguson in 1959. Since then, diglossia has been discussed, expanded, and revisited by Badawi (1973), Hudson (2002), and Albirini (2016) among others. While the discussion of the Arabic diglossic situation highlights the existence of two separate codes (High and Lo...